1. INTRODUCTION

Social networking is a forum where people can disseminate information, hold discussions, exchange
ideas, express their feelings, and share their activities in cyberspace over the internet [1], [2].
Each social network has its own strengths, such as uploading photos, text, videos, and voice recordings,
which allow rich and personal interaction [3]. People of all ages and groups can take part in
discussions on social networks regardless of social status [4]. Governments can use these advantages
to interact with the community, both to convey information and to receive complaints from the public [5].

As a consequence of freedom of opinion on social networks, governments in many countries have made
policies to protect social network users from unpleasant actions [6]. In addition, government policies
can be announced and disseminated through social networks so that they reach the public quickly [7].
Such policies then attract both support and opposition, because members of the networked community can
give their opinions on social networks without restriction.

With government policies drawing both support and opposition, social networks cannot hold back the
accelerating rate at which opinions are posted [8]. This in turn provokes reactions from various groups
and individuals on social networks [9]. It accelerates the escalation of hostile comments into
cyberbullying, which is the abuse of the internet to harass, humiliate, threaten, and insult others,
leaving a recorded digital trace [10].

Cyberbullying has been discussed in various studies. LaFrancis and Putnam [11] explained that, for
cyberbullying among teenagers, the role of parents is needed to avoid failure in the real world.
Meanwhile, according to Jiang et al. [12], government policy is needed to set regulations on
cyberbullying. The literature thus indicates that cyberbullying needs to be limited and detected using
computer science methods. Computer science offers classification and clustering methods whose results
can be used to detect particular events [13]-[15]. These methods belong to data mining, in combination
with artificial neural networks [16], [17].

The volume of cyberbullying data on social networks grows too quickly to be handled manually [18],
[19]. A method is therefore needed to detect outbreaks of cyberbullying on social networks [20].
Potha and Maragoudakis [21] studied cyberbullying detection to find cyber predators using a time
series method, in which each predator question is manually annotated for severity with a numeric label
and the dynamic time warping algorithm is applied, so that similarities in the signals become evident
and provide a direct indicator of the severity of cyberbullying in a given dialogue. Meanwhile,
Nahar et al. [22] used a data mining approach to detect cyberbullying in stochastic data. They explained
that the support vector machine (SVM) is a supervised learning algorithm that requires labeled data, so
the labels are first assigned with a fuzzy method and the SVM algorithm is then applied to the resulting
complex, multi-dimensional data; this method is called fuzzy SVM.

Andriansyah et al. [23] classified comments on the Instagram accounts of artists, often referred to as
celebrities, to detect cyberbullying, using 1053 comments and 34 test documents and obtaining a
cyberbullying detection result of 79.412%. Meanwhile, Noviantho et al. [24] identified cyberbullying
conversations with a combination of SVM and Naïve Bayes at an accuracy of 92.81%; when the algorithm
was modified with a poly kernel, the accuracy improved to 97.11%. Previous studies on cyberbullying
detection have thus often used SVM. This paper therefore designs a study to detect cyberbullying in
tweets about government policies using SVM, with a focus on the kernel function: the cubic kernel
function is used in the search for optimal results.


2. MATERIAL AND METHOD 
2.1. Dataset 

The dataset in this paper consists of trending Twitter data on the government policy known as
“cipta kerja”, which has drawn both support and opposition, so that cyberbullying can be identified in
each tweet posted by Twitter users. Tweets are selected with the keyword “cipta kerja”, the data is
crawled, and cyberbullying related to the government policy is then detected. A total of 2400 tweets
were collected.


2.2. General architecture 

This paper reports research that focuses on the development of the SVM algorithm. The kernel used is
the linear kernel, which is compared with another kernel, namely the cubic kernel. The contribution of
this paper is therefore to obtain the optimal accuracy in detection. A general architecture was defined
to keep the research focused; it is illustrated in Figure 1 and consists of the following steps:
— Perform data search with the keyword “cipta kerja” on twitter. 
— Crawling data. 
— Preprocessing data. 
— Conduct detection training. 
— Classifying with SVM using the linear kernel function. 
— Classifying with SVM using the cubic kernel function. 
— Analysis of the success in obtaining optimization from detection using SVM. 

Figure 1. General architecture


2.3. Optimization of SVM 

SVM is a relatively new algorithm that is able to solve regression and classification problems on very
large amounts of data [25], [26]. SVM uses supervised learning: the model is first trained under
supervision and then enters the testing stage [27], [28]. In the process, SVM also maps the data from
the input space into a kernel feature space in which the classes can be separated linearly [29], [30].

The SVM algorithm also has a significant advantage in optimization, namely that it attains the minimum
value for both global and local problems. SVM is therefore often used to classify, detect, estimate,
and predict [27]. Historically, SVM was introduced by Vapnik in 1963 for classification problems
[31], [32]. SVM can also classify nonlinear, high-dimensional data [33]. Thanks to its strong learning
ability, SVM is widely applied to real-world problems with accurate computational techniques [29], [34].
Its supervised learning techniques are often found in biophotonics, image detection, classification,
estimation, and prediction problems [35]. SVM is calculated based on the hyperplane in (1) [36], [37]:


$$W^{T}\phi(x) + b = 0 \tag{1}$$

where $W$ is the normal vector of the hyperplane, $\phi(x)$ is a function of the input vector, and $b$
is the bias value. An optimization is then carried out to obtain the minimum value based on (2) [35]:

$$\min_{\alpha} L(\alpha) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j K(x_i, x_j)\,\alpha_i \alpha_j - \sum_{i=1}^{N} \alpha_i \tag{2}$$

where $x_i$ is the input vector, $y_i$ is the corresponding class value, $\alpha_i$ is the Lagrange
multiplier, and $K$ is the kernel function, here either the linear or the cubic kernel. The combination
of parameters must be determined in advance. SVM can thus be grouped into linear and nonlinear forms.
The linear form is given in (3):

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i \, x_i \cdot x + \beta_0 \tag{3}$$

where $x_i$ is the input vector with label $y_i$, $\alpha_i$ is the Lagrange multiplier, and $\beta_0$
is the bias value. For nonlinear problems the equation is changed to (4) [38]:

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + \beta_0 \tag{4}$$

where $N$ is the number of input vectors.
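
As an illustration of (3) and (4), the decision value of a trained SVM can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the implementation used in this study: the cubic kernel is assumed to be the degree-3 polynomial (u . v + c)^3 with c = 1, the function names are hypothetical, and the support vectors, labels, multipliers, and bias are hand-picked toy values, not fitted parameters.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u: Vector, v: Vector) -> float:
    """Linear kernel: K(u, v) = u . v."""
    return dot(u, v)


def cubic_kernel(u: Vector, v: Vector, coef0: float = 1.0) -> float:
    """Assumed cubic kernel: K(u, v) = (u . v + coef0) ** 3."""
    return (dot(u, v) + coef0) ** 3


def decision_value(
    x: Vector,
    support_vectors: Sequence[Vector],
    labels: Sequence[int],
    alphas: Sequence[float],
    bias: float,
    kernel: Callable[[Vector, Vector], float],
) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + bias, as in (4)."""
    total = sum(
        alpha * y * kernel(sv, x)
        for alpha, y, sv in zip(alphas, labels, support_vectors)
    )
    return total + bias


# Toy values chosen by hand for illustration only (not fitted to any data).
support_vectors = [(1.0, 0.0), (0.0, 1.0)]
labels = [1, -1]
alphas = [0.5, 0.5]
bias = 0.0
x = (2.0, 1.0)

linear_value = decision_value(x, support_vectors, labels, alphas, bias, linear_kernel)
cubic_value = decision_value(x, support_vectors, labels, alphas, bias, cubic_kernel)
print(linear_value, cubic_value)
```

With the linear kernel the sketch reduces to (3); swapping in the cubic kernel gives the nonlinear form (4) without changing the rest of the computation. The sign of the returned value decides the class in the two-class case.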


3. RESULTS AND DISCUSSION 

This section presents the results of applying sentiment analysis to social media data with the keyword
“cipta kerja”, using the SVM algorithm with the cubic kernel and the linear kernel. The study uses 2400
tweets from Twitter. The data is obtained by crawling Twitter, after which text preprocessing converts
the unstructured data into structured data. The data is then labeled manually as positive, negative, or
neutral, and sentiment analysis is carried out with the SVM algorithm. The results of the sentiment
analysis with the cubic and linear kernels are as follows.


3.1. Text preprocessing results 

The result of text preprocessing is tweet data with the keyword “cipta kerja” that has been transformed
and cleaned of irrelevant characters. Text preprocessing has four stages: case folding, tokenization,
stopword removal, and stemming. In this study, only the case folding, tokenization, and stopword
removal stages were carried out, because the resulting data does not require stemming. The result of
each stage is shown below.
a. Case folding

Case folding is a text preprocessing step that converts the entire content of each tweet to lowercase
so that the text data can be analyzed. In this research it is applied to the text data obtained from
Twitter. The results of case folding on the dataset are shown in Table 1.


Table 1. The case folding process
No | Tweet | Result of case folding
1 | RT @YLBHI: Rezim pemerintahan saat ini banyak memunculkan kebijakan yang menyengsarakan rakyat dan hanya mementingkan penguasa. Hukum diper... | rezim pemerintahan saat ini banyak memunculkan kebijakan yang menyengsarakan rakyat dan hanya mementingkan diper
2 | RT @ChaUnk_VR1: UU Cipta kerja telah ditolak oleh berbagai lapisan masyarakat sejak belum disahkan dan mengakibatkan gelombang PHK yang uga... | uu cipta kerja telah ditolak oleh berbagai lapisan masyarakat sejak belum disahkan dan mengakibatkan gelombang phk yang uga
2399 | RT @atr bpn: Halo #SobATRBPN, Kementerian ATR/BPN melalui PERPU Nomor 2 Tahun 2022 tentang Cipta kerja, akan mengatur pemanfaatan Hak Peng... | halo sobatrbpn kementerian atrbpn melalui perpu nomor 2 tahun 2022 tentang cipta kerja akan mengatur pemanfaatan hak peng
2400 | RT @vita AVP: Agar tercapai pemahaman yang sinkron antar pemangku kepentingan, Kominfo terus melakukan sosialisasi UU No.2/2020 Tentang Cip... | agar tercapai pemahaman yang sinkron antar pemangku kepentingan kominfo terus melakukan sosialisasi uu no22020 tentang cip
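
The case folding step can be sketched as below. The cleaning rules (dropping a leading "RT @user:" prefix and deleting punctuation without inserting spaces) are assumptions inferred from the rows of Table 1, and the function name is hypothetical; the study does not list its exact rules.

```python
import re


def case_fold(tweet: str) -> str:
    """Lowercase a tweet and strip characters that are not letters, digits, or spaces."""
    # Assumed rule: drop a leading retweet prefix such as "RT @user:".
    text = re.sub(r"^RT\s+@[^:]*:\s*", "", tweet)
    text = text.lower()
    # Assumed rule: delete punctuation without replacing it by a space,
    # so that "no.2/2020" becomes "no22020" as in Table 1.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()


raw = (
    "RT @ChaUnk_VR1: UU Cipta kerja telah ditolak oleh berbagai lapisan "
    "masyarakat sejak belum disahkan dan mengakibatkan gelombang PHK yang uga..."
)
folded = case_fold(raw)
print(folded)
```

Applied to the second tweet of Table 1, the sketch returns the lowercase text shown in the result column of that row.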


b. Tokenization

Tokenization is the process of splitting tweet data into separate words, or tokens. Here the text from
Twitter that has already been case folded is tokenized. The results of tokenization on the dataset are
shown in Table 2.

Table 2. The tokenization process
No | Tweet | Tokenization results
1 | rezim pemerintahan saat ini banyak memunculkan kebijakan yang menyengsarakan rakyat dan hanya mementingkan diper | ‘rezim’, ‘pemerintahan’, ‘saat’, ‘ini’, ‘banyak’, ‘memunculkan’, ‘kebijakan’, ‘yang’, ‘menyengsarakan’, ‘rakyat’, ‘dan’, ‘hanya’, ‘mementingkan’, ‘diper’
2 | uu cipta kerja telah ditolak oleh berbagai lapisan masyarakat sejak belum disahkan dan mengakibatkan gelombang phk yang uga | ‘uu’, ‘cipta’, ‘kerja’, ‘telah’, ‘ditolak’, ‘oleh’, ‘berbagai’, ‘lapisan’, ‘masyarakat’, ‘sejak’, ‘belum’, ‘disahkan’, ‘dan’, ‘mengakibatkan’, ‘gelombang’, ‘phk’, ‘yang’, ‘uga’
2399 | halo sobatrbpn kementerian atrbpn melalui perpu nomor 2 tahun 2022 tentang cipta kerja akan mengatur pemanfaatan hak peng | ‘halo’, ‘sobatrbpn’, ‘kementerian’, ‘atrbpn’, ‘melalui’, ‘perpu’, ‘nomor’, ‘tahun’, ‘tentang’, ‘cipta’, ‘kerja’, ‘akan’, ‘mengatur’, ‘pemanfaatan’, ‘hak’, ‘peng’
2400 | agar tercapai pemahaman yang sinkron antar pemangku kepentingan kominfo terus melakukan sosialisasi uu no22020 tentang cip | ‘agar’, ‘tercapai’, ‘pemahaman’, ‘yang’, ‘sinkron’, ‘antar’, ‘pemangku’, ‘kepentingan’, ‘kominfo’, ‘terus’, ‘melakukan’, ‘sosialisasi’, ‘uu’, ‘no’, ‘tentang’, ‘cip’
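
A tokenizer consistent with Table 2 can be sketched as follows. The rule that only runs of letters are kept, so that standalone numbers disappear and "no22020" becomes "no", is an assumption inferred from the table; the function name is hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split case-folded text into tokens made of letters only."""
    # Assumed rule: digits are not part of any token.
    return re.findall(r"[a-z]+", text.lower())


tokens = tokenize("melakukan sosialisasi uu no22020 tentang cip")
print(tokens)
```

The same call on "perpu nomor 2 tahun 2022" keeps the three words and drops both numbers, as in row 2399 of Table 2.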

c. Stopword

Stopword removal is the process of removing words that carry no meaning from the tweet data. It is
applied after case folding and tokenization. The results of stopword removal on the dataset are shown
in Table 3.


Table 3. The stopword process
No | Tweet | Stopword result
1 | rezim pemerintahan saat ini banyak memunculkan kebijakan yang menyengsarakan rakyat dan hanya mementingkan diper | rezim pemerintahan saat ini banyak memunculkan kebijakan yang menyengsarakan rakyat dan hanya mementingkan
2 | uu cipta kerja telah ditolak oleh berbagai lapisan masyarakat sejak belum disahkan dan mengakibatkan gelombang phk yang uga | uu cipta kerja telah ditolak oleh berbagai lapisan masyarakat sejak belum disahkan dan mengakibatkan gelombang phk
2399 | halo sobatrbpn kementerian atrbpn melalui perpu nomor 2 tahun 2022 tentang cipta kerja akan mengatur pemanfaatan hak peng | halo sobat kementerian atrbpn melalui perpu nomor 2 tahun 2022 tentang cipta kerja akan mengatur pemanfaatan hak
2400 | agar tercapai pemahaman yang sinkron antar pemangku kepentingan kominfo terus melakukan sosialisasi uu no22020 tentang cip | agar tercapai pemahaman yang sinkron antar pemangku kepentingan kominfo terus melakukan sosialisasi uu no22020 tentang
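
Stopword removal amounts to filtering the token list against a fixed word list, as in the sketch below. The list used here is hypothetical and contains only the truncated fragments that disappear in Table 3; the study does not state which stopword list it used, and a real run would rely on a full Indonesian list.

```python
from typing import Iterable, List, Set

# Hypothetical list for illustration only; not the list used in the study.
ILLUSTRATIVE_STOPWORDS: Set[str] = {"diper", "uga", "peng", "cip"}


def remove_stopwords(
    tokens: Iterable[str], stopwords: Set[str] = ILLUSTRATIVE_STOPWORDS
) -> List[str]:
    """Keep only the tokens that are not in the stopword list."""
    return [token for token in tokens if token not in stopwords]


tokens = ["rakyat", "dan", "hanya", "mementingkan", "diper"]
filtered = remove_stopwords(tokens)
print(filtered)
```

Using a set for the word list keeps each membership check constant-time, which matters little for 2400 tweets but scales to larger crawls.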


3.2. Sentiment distribution 

Sentiment distribution identifies and maps the tweet data onto positive, negative, and neutral
sentiments. Its purpose is to show the opinions contained in the tweet data from social media with the
keyword “cipta kerja”, so that the pattern of opinions expressed by the community can be seen. The
sentiment distribution is shown in Figure 2.

Figure 2. Sentiment distribution

Figure 2 shows the sentiment distribution of the data collected from Twitter. Of the 2400 tweets used,
1550 carry the neutral label, 700 the negative label, and 150 the positive label. With the data
distributed in this way, training and testing can then be carried out with the SVM algorithm.
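
The label counts reported for Figure 2 can be turned into class shares with a short tally. The sketch below rebuilds a label list from the three reported totals purely for illustration; in practice the list would come from the manually labeled tweets.

```python
from collections import Counter

# Label list reconstructed from the reported totals (illustration only).
labels = ["neutral"] * 1550 + ["negative"] * 700 + ["positive"] * 150

counts = Counter(labels)
total = sum(counts.values())
# Share of each sentiment class in percent, rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, n in counts.most_common():
    print(label, n, shares[label])
```

The tally makes the class imbalance explicit: the positive class accounts for only 6.25% of the 2400 tweets.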


3.3. Division of training and testing data

The data is divided into two parts, namely training data and testing data. The training data is used to
train the sentiment analysis model with the SVM algorithm, while the testing data is used to test the
resulting model. The division is visualized in Figure 3: 70% of all data is used for training and 30%
for testing.


Figure 3. Distribution of data sharing (legend: data training, data testing)
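
A 70/30 division of this kind can be sketched with the standard library alone. Whether the study shuffled or stratified the data before splitting is not stated, so the seeded shuffle below is an assumption, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items and cut it into training and testing parts."""
    shuffled = list(items)
    # Assumed: a seeded shuffle, so that the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Tweet indices stand in for the 2400 labeled tweets.
train, test = split_train_test(range(2400))
print(len(train), len(test))
```

For 2400 items the cut falls at 1680, leaving 720 items for testing.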


3.4. Results of the SVM model with the linear kernel

Applying sentiment analysis with the SVM model and a linear kernel yields performance measures, namely
accuracy, precision, recall, and F1 score. The results are summarized by the confusion matrix in
Figure 4, which shows the distribution of the predictions over the positive, negative, and neutral
classes. Table 4 gives the manual confusion matrix calculation used to obtain the accuracy, recall,
precision, and F1 score.


Table 4. Manual confusion matrix calculation table with linear kernel
Actual | Positive predictions | Negative predictions | Neutral predictions
Positive | 16 | 3 | 9
Negative | 0 | 166 | 28
Neutral | 3 | 6 | 452


Based on Table 4, the accuracy, precision, recall, and F1 score are calculated as follows:
a. Accuracy

$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{all predictions}} \times 100\% = \frac{16+166+452}{16+3+9+0+166+28+3+6+452} \times 100\% = 92.3\%$$

b. Precision

$$\text{Positive} = \frac{TP}{TP+FP} = \frac{16}{16+3} = 0.842$$

$$\text{Negative} = \frac{TN}{TN+FN} = \frac{166}{166+6} = 0.965$$

$$\text{Neutral} = \frac{TP}{TP+FP} = \frac{452}{452+9+28} = 0.924$$

TELKOMNIKA Telecommun Comput El Control, Vol. 22, No. 2, April 2024: 329-339 




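c. Recall

$$\text{Positive} = \frac{TP}{TP+FN} = \frac{16}{16+9} = 0.640$$

$$\text{Negative} = \frac{TN}{TN+FP} = \frac{166}{166+28} = 0.855$$

$$\text{Neutral} = \frac{TP}{TP+FN} = \frac{452}{452+3+6} = 0.984$$

d. F1 score

$$\text{Positive} = 2 \times \frac{0.842 \times 0.640}{0.842 + 0.640} = 0.727$$

$$\text{Negative} = 2 \times \frac{0.965 \times 0.855}{0.965 + 0.855} = 0.907$$

$$\text{Neutral} = 2 \times \frac{0.924 \times 0.984}{0.924 + 0.984} = 0.953$$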
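
The per-class calculation above can also be automated. The sketch below, with hypothetical function names, derives accuracy, precision, recall, and F1 score from a confusion matrix whose rows are actual classes and whose columns are predicted classes, as in Table 4. It counts every off-diagonal entry of a column as a false positive and every off-diagonal entry of a row as a false negative, so its output need not coincide digit for digit with the hand calculation; for Table 4 it gives an accuracy of 634/683, or about 92.8%.

```python
from typing import Dict, List

LABELS = ["positive", "negative", "neutral"]

# Rows: actual class; columns: predicted class (entries of Table 4).
TABLE_4 = [
    [16, 3, 9],
    [0, 166, 28],
    [3, 6, 452],
]


def accuracy(matrix: List[List[int]]) -> float:
    """Share of predictions that lie on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_metrics(
    matrix: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1 score of each class, one-vs-rest."""
    metrics: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(row[k] for row in matrix) - tp  # rest of column k
        fn = sum(matrix[k]) - tp                 # rest of row k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


linear_accuracy = accuracy(TABLE_4)
linear_metrics = per_class_metrics(TABLE_4, LABELS)
print(round(linear_accuracy, 3))
for label in LABELS:
    print(label, {name: round(v, 3) for name, v in linear_metrics[label].items()})
```

Passing the entries of another confusion matrix, such as the one for the cubic kernel, to the same two functions allows both kernels to be compared on an identical footing.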


Figure 4. Confusion matrix (linear kernel)


The receiver operating characteristic (ROC) curve is used to assess the performance of the sentiment
analysis model on tweet data with the keyword “cipta kerja”. It compares the positive class against the
other classes. The resulting ROC curve is shown in Figure 5.

Figure 5. Result of the ROC curve SVM model with a linear kernel (one-vs-rest ROC curves)
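
A one-vs-rest ROC curve of this kind can be traced by sorting the tweets by their score for one class and lowering the decision threshold step by step. The sketch below assumes that a score for the positive class is available for every tweet (for example an SVM decision value) and that both the positive class and the rest are present; the labels and scores are invented for illustration and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_points(is_positive: Sequence[bool], scores: Sequence[float]) -> List[Point]:
    """Return (false positive rate, true positive rate) points, highest score first."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    ranked = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, positive) in enumerate(ranked):
        if positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item that shares this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points: Sequence[Point]) -> float:
    """Trapezoidal area under the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Invented example: labels and scores for the class "positive".
labels = ["positive", "neutral", "negative", "positive", "neutral"]
scores = [0.9, 0.4, 0.35, 0.3, 0.1]

is_positive = [label == "positive" for label in labels]
points = roc_points(is_positive, scores)
auc = area_under_curve(points)
print(points, auc)
```

Repeating the call with the negative or neutral class as the target yields the remaining one-vs-rest curves.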
3.5. Results of the SVM model with the cubic kernel

Applying sentiment analysis with the SVM model and a cubic kernel yields performance measures, namely
accuracy, precision, recall, and F1 score. The SVM algorithm relies on the calculations in (1)-(4).
The results are summarized by the confusion matrix in Figure 6, which shows the distribution of the
predictions over the positive, negative, and neutral classes. Table 5 gives the manual confusion
matrix calculation used to obtain the accuracy, recall, precision, and F1 score.

Figure 6. Confusion matrix (cubic kernel)


Table 5. Manual confusion matrix calculation table with cubic kernel
Actual | Positive predictions | Negative predictions | Neutral predictions
Positive | 11 | 0 | 17
Negative | 0 | 158 | 36
Neutral | 0 | 1 | 460


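Based on Table 5, the accuracy, precision, recall, and F1 score are calculated as follows:
a. Accuracy

$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{all predictions}} \times 100\% = \frac{11+158+460}{11+0+17+0+158+36+0+1+460} \times 100\% = 90\%$$

b. Precision

$$\text{Positive} = \frac{TP}{TP+FP} = \frac{11}{11+0} = 1.0$$

$$\text{Negative} = \frac{TN}{TN+FN} = \frac{158}{158+1} = 0.99$$

$$\text{Neutral} = \frac{TP}{TP+FP} = \frac{460}{460+53} = 0.89$$

c. Recall

$$\text{Positive} = \frac{TP}{TP+FN} = \frac{11}{11+17} = 0.39$$

$$\text{Negative} = \frac{TN}{TN+FP} = \frac{12}{12+36} = 0.25$$

$$\text{Neutral} = \frac{TP}{TP+FN} = \frac{194}{194+1} = 0.99$$

d. F1 score

$$\text{Positive} = 2 \times \frac{1.0 \times 0.3939}{1.0 + 0.3939} = 0.56$$

$$\text{Negative} = 2 \times \frac{0.9937 \times 0.25}{0.9937 + 0.25} = 0.39$$

$$\text{Neutral} = 2 \times \frac{0.8961 \times 0.9949}{0.8961 + 0.9949} = 0.94$$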

The ROC curve is used to assess the performance of the sentiment analysis model on tweet data with the
keyword “cipta kerja”. It compares the positive class against the other classes. The resulting ROC
curve is shown in Figure 7.

Figure 7. Result of the ROC curve SVM model with a cubic kernel (one-vs-rest ROC curves, positive sentiment vs rest)


4. CONCLUSION 

In this paper, cyberbullying was detected in tweets about a government policy. Using the keyword
“cipta kerja” as an example, comments from Twitter users were found to contain bullying, which gives
rise to cyberbullying. The detection method used is SVM, optimized with the cubic kernel. The results
show that the accuracy of the linear kernel is 92.3%, while the cubic kernel function obtains an
accuracy of 90%. This identifies the linear kernel as the more optimal of the two, because its
resulting accuracy is higher.